Abstract

Finding correspondences in wide baseline setups is a
challenging problem. Existing approaches have focused
largely on developing better feature descriptors for
correspondence and on accurate recovery of epipolar line
constraints. This paper focuses on the challenging problem
of finding correspondences once approximate epipolar
constraints are given. We introduce a novel method that
integrates a deformation model. Specifically, we formulate
the problem as finding the largest number of corresponding
points related by a bounded distortion map that obeys the
given epipolar constraints. We show that, while the set of
bounded distortion maps is not convex, the subset of maps
that obey the epipolar line constraints is convex, allowing
us to introduce an efficient algorithm for matching. We
further utilize a robust cost function for matching and employ
majorization-minimization for its optimization. Our
experiments indicate that our method finds significantly more
accurate maps than existing approaches.

1. Introduction 

Finding point correspondences in image pairs of a static 
scene is a classical problem in stereo and structure from 
motion (SFM). Finding correspondences in wide baseline 
setups, i.e., when the cameras’ focal centers are distant, is 
particularly challenging. Images obtained in such setups are 
generally subject to significant distortion and their content 
may differ substantially also due to occlusion. 

The problem of wide baseline stereo matching has
received significant attention in recent years (see a brief
review in Section 2). Existing approaches have focused
largely on developing better feature descriptors for
correspondence and on accurate recovery of epipolar line
constraints. However, although challenging, the problem of
finding correspondences once the epipolar geometry has
been estimated has not yet received sufficient attention.

In this paper we introduce a novel method for finding
correspondences in wide baseline image pairs of a static
scene. Noting that matching is often ambiguous even when
epipolar constraints are taken into account, we propose to
address the problem by using deformation maps to model
geometric changes along epipolar lines. Specifically, given
two images and an estimated fundamental matrix, our
algorithm seeks to compute a geometric map that relates the
images and satisfies two requirements: first, it should
respect the epipolar constraints, and second, the amount of
distortion that the mapping can exert locally should be
bounded. We refer to such a map as an epipolar consistent
bounded-distortion (EBD) map. Our core theoretical
contribution is in showing that, while the set of maps whose
distortion is bounded is non-convex, its intersection with
maps that satisfy the epipolar constraints (with an ordering
assumption [2]) is convex, allowing us to introduce an
efficient matching algorithm.

Bounded distortion (BD) maps are continuous, locally
injective transformations whose conformal distortion at
every point (defined as the condition number of their
Jacobian matrices) is bounded. Intuitively, the conformal
distortion measures how different the local map is from a
similarity transformation, i.e., how much the local aspect
ratio is changed. Bounding the conformal distortion is
motivated by the following observation. Suppose two cameras
are set so that their image planes are parallel (including,
as a special case, rectified setups). For any fronto-parallel
plane it can be readily verified that its projections onto
the two image planes are related by a similarity
transformation. Therefore such projections undergo no
distortion. Bounding the distortion in these setups therefore
limits the slant and tilt of the recovered planes.

To formulate our solution we define a cost function that
seeks an EBD map that maximizes the number of matches.
We optimize this robust objective using
majorization-minimization. The use of a robust objective
allows us to recover when certain portions of the images are
distorted beyond the bounds allowed by our algorithm or when
the set of initial correspondences includes outliers.

We have tested our method on datasets containing pairs 
of images with ground truth matches and compared it to 
several state-of-the-art methods. Our method consistently 
outperformed these methods. 

2. Previous work

The problem of wide baseline stereo matching has been
approached by a number of studies. Considerable effort
has been put into designing better features and descriptors
and into utilizing them to estimate the fundamental
matrix. Several studies have used affine invariant features
[29, 31]. A wide variety of alternatives to the SIFT
descriptor [22] have been proposed, emphasizing speed (e.g.,
the Daisy descriptor [27]) or invariance to extreme
transformations such as scale changes [12]. Other studies
have utilized line segments [4] and regional features (e.g.,
MSER [13] and texture-based descriptors [24]). The method of
[23] groups coplanar points by identifying homographies and
uses them to estimate epipolar lines. A few of those
descriptors were designed to also account for occlusion
(e.g., [27, 28]). Finally, a number of studies have
approached the problem from a multiview perspective [25, 9].

Also relevant to our work are generic methods for robust,
dense matching, based on a variety of point-feature and
regional descriptors, such as SIFT-flow [21, 20],
PatchMatch [3], NRDC [11], LDOF [7] and, more recently, SPM
[14], as well as models of deformation (e.g., [5, 8, 16]),
which can potentially be applied in a wide baseline setting.
Another recent study [17] proposed an algorithm for mosaic
stitching by finding a map that smoothly departs from
a global affine transformation. Our experiments include a
comparison to [16] and [20], modified to seek matches near
corresponding epipolar lines. We show that the results of our
method are superior to these methods even with these
modifications, suggesting that our global model of
deformation provides a more suitable model for wide baseline
stereo.

Our model of deformation maps is derived from the work
of [18], which proposed an approach for optimizing
functionals over bounded distortion transformations using
sequences of convex optimization problems. [19] further used
this approach for robust feature matching in general pairs of
images (analogous to RANSAC [10], but allowing many degrees
of freedom). Our work shows that the set of EBD maps is
convex, allowing us to introduce an efficient algorithm that
is less sensitive to initialization.

3. Method 

In this section we describe our algorithmic approach to
the problem of wide baseline image matching. We assume
we are given two images $I, J \subset \mathbb{R}^2$, with
their fundamental matrix $F$ either supplied as input or
computed automatically, e.g., using RANSAC [10]. Our goal is
to find a map $\Phi$ from $I$ to $J$ that relates
corresponding points in the two images; i.e., for every pair
of corresponding points $(p, q) \in I \times J$, the desired
map satisfies $\Phi(p) = q$. We start with a large set of
candidate corresponding pairs of points
$(p_m, q_m) \in I \times J$, $m = 1, \ldots, n$. Then, we
search for a map $\Phi$ from the family $\mathcal{D}^\mu$ of
epipolar $\mu$-bounded distortion mappings (defined below)
that matches as many pairs as possible. Specifically, we aim
at optimizing

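\[
\begin{aligned}
\min_{\Phi} \quad & \sum_{m=1}^{n} \|\Phi(p_m) - q_m\|^0 & \text{(1a)} \\
\text{s.t.} \quad & \Phi \in \mathcal{D}^\mu & \text{(1b)}
\end{aligned}
\]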

where for $v \in \mathbb{R}^2$ the norm $\|\cdot\|^0$ is
defined by $\|v\|^0 = 1$ if $v \neq 0$, and $\|v\|^0 = 0$
otherwise. The optimization problem (1) strives to maximize
the number of matched pairs under the deformation model. This
can be seen by noting that the energy (1a) counts how many
pairs $(p_m, q_m)$ are not matched by $\Phi$. Similarly to
[19], we solve (1) by: 1) computing a set of candidate pairs
of correspondences $(p_m, q_m)$; and 2) optimizing (1) using
an iteratively re-weighted least-squares (IRLS) approach.
However, differently from previous work, we devise a novel
formulation of the Bounded Distortion deformation model that
is shown to be convex when matching images under the epipolar
constraint. The convex model facilitates the optimization of
(1), allows considerably faster optimization times,
incorporates epipolar constraints, and does not require any
particular initialization or convexification. We explain the
deformation model next.
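
To make the counting interpretation of the energy (1a) concrete, the following minimal sketch evaluates it for a toy map. The function name `unmatched_count`, the tolerance `tol` (a floating-point stand-in for exact equality), and the data are illustrative assumptions, not part of the method's implementation.

```python
import numpy as np

def unmatched_count(phi, P, Q, tol=1e-9):
    """Energy (1a): number of pairs (p_m, q_m) with phi(p_m) != q_m.

    `tol` is an assumption: a floating-point stand-in for exact equality.
    """
    residuals = np.linalg.norm(phi(P) - Q, axis=1)
    return int(np.sum(residuals > tol))

# Toy example: a pure translation matches two of three candidate pairs.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[2.0, 1.0], [3.0, 1.0], [5.0, 5.0]])  # last pair is an outlier
shift = lambda X: X + np.array([2.0, 1.0])
energy = unmatched_count(shift, P, Q)
```

Minimizing this count over admissible maps is what makes the objective robust: an outlier pair costs one unit no matter how far off it is.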

3.1. Convex Epipolar BD Deformations 

At the core of our method is a convex characterization
of the space $\mathcal{D}^\mu$ of epipolar BD deformations.
In a nutshell, $\mathcal{D}^\mu$ is a one-parameter family of
non-rigid deformations that allow a bounded amount of
distortion and respect epipolar constraints. To formulate
$\mathcal{D}^\mu$ we introduce a triangulation
$T = (V, E, F)$ on image $I$, where $V = \{v_i\} \subset I$
is the vertex set, $E = \{e_{ij}\}$ the edge set, and
$F = \{f_{ijk}\}$ the triangles (faces).

A mapping $\Phi \in \mathcal{D}^\mu$ is represented by
prescribing new locations to the vertices of the
triangulation in the second image,
$\tilde{V} = \{\tilde{v}_i\} \subset J$. The mapping $\Phi$
is defined as the unique piecewise-linear (PL) mapping
satisfying $\Phi(v_i) = \tilde{v}_i$. We denote by
$\Phi_{ijk} = \Phi|_{f_{ijk}}$ the affine map given by the
restriction of $\Phi$ to the triangle $f_{ijk} \in F$.

Using the entire collection of PL mappings $\{\Phi\}$ defined
on a triangulation $T$ is far too general, as every vertex is
allowed to move arbitrarily, and in the context of stereo
this would allow unreasonable geometries to be considered.
Instead, we restrict our attention to a one-parameter family
of mapping spaces that translate to a reasonable assumption
on the scene's geometry. In particular, in addition to
imposing epipolar line constraints, we propose to bound the
deviation of the affine maps $\Phi_{ijk}$ from similarity
transformations using a parameter $0 < \mu < 1$, as defined
below. We next derive this constraint for a single affine
transformation and later show how to set the constraints for
the entire triangulation $T$ to define $\mathcal{D}^\mu$.

3.1.1 Epipolar Bounded-Distortion affine map 

We now focus on a single affine map. A general planar
affine map can be written uniquely as

\[ f(x) = Bx + Cx + t, \qquad (2) \]

where

\[
B = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, \quad
C = \begin{pmatrix} c & d \\ d & -c \end{pmatrix}, \quad
t \in \mathbb{R}^2
\]

are a similarity matrix, an anti-similarity matrix (i.e., a
reflected similarity), and a translation vector, respectively
[18, 19]. The ratio of Frobenius norms of the anti-similarity
and similarity parts, i.e.,

\[
\mu_f = \frac{\|C\|_F}{\|B\|_F}
      = \frac{\sqrt{c^2 + d^2}}{\sqrt{a^2 + b^2}},
\]

provides a natural scale-invariant measure for the deviation
of $f$ from a similarity. In fact,

\[ \frac{1 + \mu_f}{1 - \mu_f} \]

is the conformal distortion of the affine map, which equals
the ratio of the maximal singular value to the minimal
singular value (i.e., the condition number) of the linear
part of the affine map, $B + C$. We therefore set the
$\mu$-Bounded Distortion constraint,

\[ \mu_f \le \mu, \qquad (3) \]

where, as mentioned above, $0 < \mu < 1$ is a parameter of
the deformation space. We note that an affine map satisfying
(3) is also orientation preserving, since
$2\det(B + C) = \|B\|_F^2 - \|C\|_F^2$ and $0 < \mu < 1$.
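
The relation between the norm ratio and the condition number can be checked numerically. The sketch below splits the linear part of an affine map into its similarity and anti-similarity parts and compares the two quantities; the helper names `similarity_split` and `mu` and the example matrix are hypothetical, chosen for illustration only.

```python
import numpy as np

def similarity_split(A):
    """Split a 2x2 matrix A into B + C with
    B = [[a, b], [-b, a]] (similarity) and C = [[c, d], [d, -c]] (anti-similarity)."""
    a = 0.5 * (A[0, 0] + A[1, 1])
    c = 0.5 * (A[0, 0] - A[1, 1])
    b = 0.5 * (A[0, 1] - A[1, 0])
    d = 0.5 * (A[0, 1] + A[1, 0])
    B = np.array([[a, b], [-b, a]])
    C = np.array([[c, d], [d, -c]])
    return B, C

def mu(A):
    """Ratio of Frobenius norms ||C|| / ||B|| of the linear part A."""
    B, C = similarity_split(A)
    return np.linalg.norm(C) / np.linalg.norm(B)

# Arbitrary example of an orientation-preserving linear part.
A = np.array([[2.0, 0.3], [-0.1, 1.5]])
B, C = similarity_split(A)
m = mu(A)
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
cond = sigma[0] / sigma[1]                   # condition number of A
```

For this example `cond` agrees with `(1 + m) / (1 - m)`, and `2 * det(A)` agrees with the difference of squared Frobenius norms, as stated above.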

The Bounded-Distortion constraint (3) is not convex and
requires some convexification to work with in practice [18].
However, surprisingly, it becomes convex when we intersect
this constraint with the epipolar line constraints (assuming
epipolar line pairs can be oriented, as we explain below).
More generally, when the affine map $f$ is known to map some
directed line $\ell_1$ (e.g., an epipolar line) to another
directed line $\ell_2$ while preserving the direction, Eq. (3)
can be formulated as a convex constraint in $B, C$; see
Figure 1 for an illustration. We summarize this in a
proposition.

Proposition 1 The collection of $\mu$-Bounded-Distortion
planar affine transformations that map a directed line
$\ell_1$ to another directed line $\ell_2$ is convex.


We start by proving the proposition for the case in which
both directed lines coincide with the X-axis with the
positive direction,

\[ \ell_1 = \ell_2 = \ell = \mathrm{span}\{e_1\}, \]

where $e_1 = (1, 0)^T$. By assumption we have in particular
that $f(0), f(e_1) \in \ell$ and $e_1^T f(0) < e_1^T f(e_1)$.
This implies that

\[ e_2^T t = 0, \quad d = b, \quad a + c > 0, \qquad (4) \]

where $e_2 = (0, 1)^T$. Plugging this into (3), squaring and
rearranging we get

\[ (1 - \mu^2) b^2 + c^2 \le \mu^2 a^2. \qquad (5) \]

If we show that $a > 0$ then taking the square root of both
sides of (5) leads to a (convex) second-order cone (SOC)
constraint,

\[ \sqrt{(1 - \mu^2) b^2 + c^2} \le \mu a. \qquad (6) \]

Indeed, since $a + c > 0$ and (5) implies that $|a| > |c|$,
we must have $a > 0$. We have therefore shown that any affine
map (2) that satisfies the assumption (3) and maps the real
axis $\ell$ to itself while preserving the positive direction
has to satisfy (4) and (6). In the other direction, any
non-zero affine map that satisfies (4) and (6) maps $\ell$ to
itself while preserving the positive direction (since
$a + c > 0$) and satisfies (3).
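
As a numerical illustration of this convexity (not a substitute for the proof), the sketch below tests constraints (4) and (6) on convex combinations of two feasible coefficient vectors. The function names, the tolerance `tol`, and the sample coefficients are assumptions made for the example.

```python
import numpy as np

def in_axis_ebd_set(a, b, c, d, t, mu, tol=1e-12):
    """Check the linear constraints (4) and the SOC constraint (6)
    for f(x) = Bx + Cx + t mapping the directed X-axis to itself."""
    linear_ok = abs(t[1]) <= tol and abs(d - b) <= tol and a + c > 0
    soc_ok = np.sqrt((1 - mu**2) * b**2 + c**2) <= mu * a + tol
    return bool(linear_ok and soc_ok)

def feasible(params, mu):
    a, b, c, d, t1, t2 = params
    return in_axis_ebd_set(a, b, c, d, (t1, t2), mu)

mu_bound = 0.5
# Two feasible maps, stored as (a, b, c, d, t1, t2); values are arbitrary.
params1 = np.array([1.0, 0.2, 0.1, 0.2, 0.5, 0.0])
params2 = np.array([2.0, -0.5, -0.3, -0.5, -1.0, 0.0])

# Every convex combination of the two should remain feasible.
combos_ok = all(
    feasible((1 - s) * params1 + s * params2, mu_bound)
    for s in np.linspace(0.0, 1.0, 11)
)

# A map with too much anti-similarity content violates (6).
violating = np.array([1.0, 0.9, 0.5, 0.9, 0.0, 0.0])
```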

For general directed lines $\ell_1, \ell_2$ we can represent
any affine map $f^*$ satisfying the assumptions of
Proposition 1 as

\[ f^* = g_2 \circ f \circ g_1^{-1}, \qquad (7) \]

where $g_i$, $i = 1, 2$, are similarities that map the X-axis
$\ell$ (with positive direction) to $\ell_i$, and $f$ is a
$\mu$-Bounded-Distortion map that maps $\ell$ to itself while
preserving the positive direction as above. Note that this
change of coordinates does not change the distortion $\mu_f$
of the affine map, and that for fixed $g_1, g_2$ the
coefficients of $f^*$ depend affinely on those of $f$.
Therefore, the collection $\{f^*\}$ of all affine maps
satisfying the assumption of the proposition with general
lines is convex.

The consequence of this proposition is that the set of
$\mu$-bounded distortion affine transformations that map an
epipolar line in one image to an epipolar line in another
image is convex, provided that the pair of epipolar lines can
be oriented. Consider a pair of epipolar lines $\ell_1$ and
$\ell_2$. It can be readily shown that any planar patch in 3D
whose front side is visible to both cameras will project to
$\ell_1$ and $\ell_2$ with consistent orientation. We note,
however, that for more general scene structures orientation
may not always be preserved. Still, many stereo algorithms
assume ordering (dating back to [2]). We therefore conclude
with the following corollary:

Corollary 1 The collection of $\mu$-Bounded-Distortion
planar affine transformations that map a directed epipolar
line $\ell_1$ to another directed epipolar line $\ell_2$ is
convex.







Figure 1. Epipolar Bounded-Distortion affine mapping. 


3.1.2 Mappings of triangulations 

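We use the results of the previous subsection to formulate
our convex mapping space $\mathcal{D}^\mu$, where each of its
members, $\Phi \in \mathcal{D}^\mu$, is a piecewise linear
map whose restriction to a triangle $f_{ijk} \in F$ is an
affine map $\Phi_{ijk}$. Let us denote

\[ \Phi_{ijk}(x) = B_{ijk} x + C_{ijk} x + t_{ijk}. \]

The coefficients of this affine map, $B_{ijk}$, $C_{ijk}$,
and $t_{ijk}$, are all linear functions of the degrees of
freedom $\tilde{V}$ (i.e., the mapped vertices) of the
mapping $\Phi$, as follows:

\[
[\, B_{ijk} + C_{ijk} \mid t_{ijk} \,]
  = [\, \tilde{v}_i \;\; \tilde{v}_j \;\; \tilde{v}_k \,]
    \begin{pmatrix} v_i & v_j & v_k \\ 1 & 1 & 1 \end{pmatrix}^{-1},
\qquad (8)
\]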


where here $v_i, \tilde{v}_i \in \mathbb{R}^2$ are viewed as
column vectors in the plane. Note that the inverted matrix
(the rightmost matrix in (8)) is constant, as it only depends
on the source triangulation's vertices $V$. Therefore, if the
triangle $f_{ijk}$ has an edge on an epipolar line $\ell_1$,
we can set $\ell_2$ to the corresponding epipolar line in $J$
(i.e., $\ell_2 = Fp$ for a point $p \in \ell_1$ other than
the epipole, with $F$ being the fundamental matrix) and
combine (8) with (7), (6) and (4) to constrain $\Phi_{ijk}$
to be $\mu$-Bounded Distortion and to respect the epipolar
constraint $\ell_1 \to \ell_2$. See Figure 1 for an
illustration. For the third vertex of $f_{ijk}$ (shown in
red) we can impose its epipolar constraint by adding the
suitable linear equation. Adding these equations for all
triangles $f_{ijk} \in F$ (one SOC and a few linear equality
constraints per triangle) results in a convex SOCP
realization of the space of PL mappings with a single
distortion parameter $\mu \in (0, 1)$.
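
The linear dependence of the per-triangle affine map on the mapped vertices, as in Eq. (8), can be sketched as follows. The function name `triangle_affine` and the sample triangles are hypothetical; the block only recovers the linear part and translation, not their split into similarity and anti-similarity parts.

```python
import numpy as np

def triangle_affine(src, dst):
    """Return (A, t) with A @ src[n] + t == dst[n] for the three corners.

    src, dst: arrays of shape (3, 2) holding the source and mapped
    triangle vertices. A plays the role of B_ijk + C_ijk in Eq. (8).
    """
    M = np.vstack([src.T, np.ones(3)])   # 3x3, depends on the source triangle only
    At = dst.T @ np.linalg.inv(M)        # 2x3 block [A | t], linear in dst
    return At[:, :2], At[:, 2]

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[1.0, 1.0], [3.0, 1.0], [1.5, 2.0]])
A, t = triangle_affine(src, dst)
```

Because `np.linalg.inv(M)` is fixed once the source triangulation is built, every entry of `A` and `t` is a linear expression in the unknown mapped vertices, which is what allows (4) and (6) to be posed as linear and second-order cone constraints on those unknowns.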


3.1.3 Triangulating the source image 

In order to construct $\mathcal{D}^\mu$ we require a
triangulation $T = (V, E, F)$ with the property that each
triangle has an edge on an epipolar line $\ell_1$ of image
$I$. We call such a $T$ an epipolar triangulation. We
construct such a triangulation by placing an equispaced grid
of distance $\eta$ over a polar coordinate frame centered at
the epipole (we used $\eta = 25$ pixels). For each triangle
we enforce its edges to coincide with the appropriate
epipolar lines by applying constrained Delaunay
triangulation. We only keep triangles whose intersection with
the image is non-empty. Figure 2 depicts an example. We
further determine the orientations of the epipolar lines.
This can be done simply by recovering projective camera
matrices from the fundamental matrix $F$ and testing the
orientation induced, say, by the $Z = \text{const}$ plane.
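
The sketch below builds a mesh with the defining property of an epipolar triangulation (each triangle has an edge on a line through the epipole). It is a simplified stand-in: it splits the cells of a polar grid directly instead of running a constrained Delaunay triangulation, and it omits cropping to the image. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def polar_epipolar_mesh(epipole, radii, angles):
    """Vertices on rays through the epipole; two triangles per polar cell.

    The first two vertices of every triangle lie on the same ray,
    i.e., on an epipolar line. Radii are assumed to be positive.
    """
    e = np.asarray(epipole, dtype=float)
    R, T = np.meshgrid(radii, angles, indexing="ij")
    verts = e + np.stack([R * np.cos(T), R * np.sin(T)], axis=-1).reshape(-1, 2)
    nt = len(angles)
    idx = lambda i, j: i * nt + j
    tris = []
    for i in range(len(radii) - 1):
        for j in range(nt - 1):
            tris.append((idx(i, j), idx(i + 1, j), idx(i + 1, j + 1)))
            tris.append((idx(i, j + 1), idx(i + 1, j + 1), idx(i, j)))
    return verts, np.array(tris)

epipole = np.array([-40.0, 10.0])
radii = np.arange(25.0, 101.0, 25.0)          # radial spacing of 25 pixels
angles = np.linspace(0.0, np.pi / 2, 5)
verts, tris = polar_epipolar_mesh(epipole, radii, angles)

# Collinearity check: the first edge of each triangle passes through the epipole.
u = verts[tris[:, 0]] - epipole
w = verts[tris[:, 1]] - epipole
cross = u[:, 0] * w[:, 1] - u[:, 1] * w[:, 0]
```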



Figure 2. Example of an epipolar triangulation of an image. For 
illustration purposes we show coarse triangles. 


3.2. Optimization 

To optimize (1) we first use a simple modification of
SIFT [22] to find candidate pairs of corresponding points
$(p_m, q_m)$ that satisfy the epipolar constraint. If the
fundamental matrix $F$ is not provided we use standard SIFT
and RANSAC to first estimate $F$.

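Next, we optimize (1) using IRLS combined with the convex
epipolar $\mu$-Bounded Distortion constraints. Assuming
a fixed list of pairs $(p_m, q_m)$, we reformulate (1) as

\[
\begin{aligned}
\min_{\Phi} \quad & \sum_{m=1}^{n} g_{p,\varepsilon}(\|h_m\|) & \text{(9a)} \\
\text{s.t.} \quad & h_m = \Phi(p_m) - q_m & \text{(9b)} \\
& \Phi \in \mathcal{D}^\mu & \text{(9c)}
\end{aligned}
\]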

where $h_m \in \mathbb{R}^2$ are auxiliary variables, and the
functions $g_{p,\varepsilon}$ will be defined soon. The map
$\Phi$ is represented by the images of the vertices of the
triangulation $T$, that is, $\{\tilde{v}_i\}$. Namely, each
vertex $v_i$ is mapped to a new (unknown) location in the
second image, $\tilde{v}_i \in J$, and $\Phi$ is the unique
piecewise linear interpolation $\Phi_{ijk}$ over the
triangles $f_{ijk}$, as described in Section 3.1.2. The
unknowns in the optimization problem (9) are therefore the
target vertex locations $\{\tilde{v}_i\}$.

The constraint (9b) is set for every $m$ by finding the
triangle $f_{ijk}$ containing $p_m$ and encoding $p_m$ in
barycentric coordinates of the corners $v_i, v_j, v_k$ of
that triangle, namely
$p_m = c_{m,i} v_i + c_{m,j} v_j + c_{m,k} v_k$, where the
barycentric weights satisfy
$c_{m,i}, c_{m,j}, c_{m,k} \ge 0$ and
$c_{m,i} + c_{m,j} + c_{m,k} = 1$. (9b) then becomes

\[
h_m = c_{m,i} \tilde{v}_i + c_{m,j} \tilde{v}_j
      + c_{m,k} \tilde{v}_k - q_m. \qquad (10)
\]
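
A minimal sketch of this barycentric encoding follows. The function name `barycentric` and the sample triangle, mapped vertices, and target point are assumptions made for illustration.

```python
import numpy as np

def barycentric(p, tri):
    """Weights c with c @ tri == p and c.sum() == 1, for tri of shape (3, 2)."""
    M = np.vstack([tri.T, np.ones(3)])
    return np.linalg.solve(M, np.append(p, 1.0))

tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # source triangle corners
c = barycentric(np.array([1.0, 1.0]), tri)              # weights of a feature point

mapped = np.array([[10.0, 10.0], [14.0, 10.0], [10.0, 18.0]])  # mapped corners
q = np.array([11.0, 12.5])                                     # candidate match
h = c @ mapped - q                                             # residual of Eq. (10)
```

The weights `c` are computed once from the source triangulation, so the residual `h` is linear in the unknown mapped corners.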
The EBD constraint (9c) is set by adding Equations
(8), (7), (6) and (4) for every triangle $f_{ijk} \in F$ of
the triangulation $T$. Note that (6) is a second-order cone
constraint, and the rest of the equations are linear
equalities and inequalities.

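Lastly, optimizing the energy (9a) w.r.t. $\Phi$ requires
coping with the non-convexity and non-smoothness of the
energy (1a). The IRLS point of view suggests replacing the
zero norm with its approximations

\[
g_{p,\varepsilon}(r) =
\begin{cases}
r^p, & r > \varepsilon \\
\frac{p}{2} \varepsilon^{p-2} r^2 + \left(1 - \frac{p}{2}\right) \varepsilon^p, & 0 \le r \le \varepsilon
\end{cases}
\qquad (11)
\]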


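The $g_{p,\varepsilon}$ functions are smooth ($C^1$) and
converge to $r^0$ as $p, \varepsilon \to 0$. For fixed
$p, \varepsilon$, (9a) is optimized iteratively by replacing
$g_{p,\varepsilon}(r)$ with a convex quadratic functional
called a majorizer, $G_{p,\varepsilon}(r, s)$, with the
properties that
$G_{p,\varepsilon}(s, s) = g_{p,\varepsilon}(s)$ and
$G_{p,\varepsilon}(r, s) \ge g_{p,\varepsilon}(r)$ for all
$r$. These two properties guarantee that the IRLS
monotonically reduces the energy in each iteration. The
majorizers $G_{p,\varepsilon}$ are similar to those in [6],

\[
G_{p,\varepsilon}(r, s) =
\begin{cases}
\frac{p}{2} s^{p-2} r^2 + \left(1 - \frac{p}{2}\right) s^p, & s > \varepsilon \\
\frac{p}{2} \varepsilon^{p-2} r^2 + \left(1 - \frac{p}{2}\right) \varepsilon^p, & 0 \le s \le \varepsilon
\end{cases}
\qquad (12)
\]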
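
The two majorizer properties can be checked numerically. The sketch below assumes the $C^1$ piecewise form of the approximation (a power $r^p$ above $\varepsilon$ and a matching quadratic below it) and the corresponding quadratic majorizer; the values of `p` and `eps` are chosen for readability rather than taken from the experiments.

```python
import numpy as np

def g(r, p, eps):
    """Smoothed approximation of the zero norm: r**p above eps,
    and a C^1-matching quadratic on [0, eps]."""
    r = np.asarray(r, dtype=float)
    quad = 0.5 * p * eps ** (p - 2) * r ** 2 + (1 - 0.5 * p) * eps ** p
    return np.where(r > eps, r ** p, quad)

def G(r, s, p, eps):
    """Quadratic majorizer of g around the previous residual norm s."""
    s_eff = max(s, eps)
    r = np.asarray(r, dtype=float)
    return 0.5 * p * s_eff ** (p - 2) * r ** 2 + (1 - 0.5 * p) * s_eff ** p

p, eps = 0.5, 1.0
r = np.linspace(0.0, 10.0, 101)

# G touches g at r = s and lies above it everywhere else.
touches = all(abs(G(s, s, p, eps) - g(s, p, eps)) < 1e-12 for s in (0.1, 1.0, 3.0))
majorizes = all(np.all(G(r, s, p, eps) >= g(r, p, eps) - 1e-12) for s in (0.1, 1.0, 3.0))
```

Since `G` is a weighted square of `r` plus a constant, minimizing it over the map reduces to a weighted least-squares problem with weight proportional to `max(s, eps) ** (p - 2)`.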


Replacing $g_{p,\varepsilon}(\|h_m\|)$ in (9a) with
$G_{p,\varepsilon}(\|h_m\|, \|h'_m\|)$, where
$h'_m = \Phi'(p_m) - q_m$ and $\Phi'$ is the map found at the
previous iteration, results in the following convex quadratic
energy in $h_m$ (remember that $h'_m$ are constants),

\[
\begin{aligned}
\min_{\Phi} \quad & \sum_{m=1}^{n} w(\|h'_m\|) \, \|h_m\|^2 & \text{(13a)} \\
\text{s.t.} \quad & h_m = \Phi(p_m) - q_m & \text{(13b)} \\
& \Phi \in \mathcal{D}^\mu & \text{(13c)}
\end{aligned}
\]

where $w(s) = \max\{s, \varepsilon\}^{p-2}$ is constant at
each iteration. In view of (10) this implies a convex
quadratic energy in the unknowns $\{\tilde{v}_i\}$. We
iteratively solve this problem, updating $h'_m$ in each
iteration until convergence. Each iteration is a convex
Second Order Cone Program (SOCP) and is solved using
MOSEK [1].

In practice, we fix $p = 0.001$ and set $\varepsilon$ to be
the diameter of image $I$ and solve the above IRLS. Upon
convergence, we update $\varepsilon = 0.5\varepsilon$ and
repeat. We continue this until $\varepsilon = 1$ (pixels).
This heuristic of starting from a large $\varepsilon$ and
decreasing it helps avoid local minima of the energy (1a),
as the larger $\varepsilon$ is, the more convex the problem
is; for example, for sufficiently large $\varepsilon$ the
global minimum of (9) lies in the convex (quadratic) part of
all terms $g_{p,\varepsilon}$ and can be found by a single
SOCP. Our algorithm is summarized in Algorithm 1.
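
The continuation scheme can be illustrated on a deliberately simplified problem. In the sketch below the map is restricted to a pure translation, so each weighted least-squares step has a closed form and no SOCP solver or distortion constraint is involved; this is a swapped-in toy model, not the EBD optimization itself. The function name, `eps0`, the iteration cap, and the data are assumptions.

```python
import numpy as np

def irls_translation(P, Q, p=0.001, eps0=100.0, eps_min=1.0, inner_iters=50):
    """IRLS with a decreasing eps for the toy model Phi(x) = x + t."""
    D = Q - P                      # residual is h_m = t - D[m]
    t = np.zeros(2)                # start from the identity map
    eps = eps0
    while eps >= eps_min:
        for _ in range(inner_iters):
            s = np.linalg.norm(t - D, axis=1)        # ||h'_m|| at previous iterate
            w = np.maximum(s, eps) ** (p - 2)        # IRLS weights
            t_new = (w[:, None] * D).sum(axis=0) / w.sum()
            converged = np.linalg.norm(t_new - t) < 1e-9
            t = t_new
            if converged:
                break
        eps *= 0.5                 # halve eps and re-solve
    return t

# Eight inlier pairs share the displacement (5, -3); three pairs are outliers.
P = np.arange(22, dtype=float).reshape(11, 2)
D_true = np.vstack([np.tile([5.0, -3.0], (8, 1)),
                    [[40.0, 30.0], [-35.0, 20.0], [10.0, -45.0]]])
Q = P + D_true
t_est = irls_translation(P, Q)
```

With a large initial `eps` all pairs receive equal weight and the first estimate is the plain mean; as `eps` shrinks, the outliers are down-weighted and the estimate moves toward the displacement shared by the inliers.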


4. Experiments 

Datasets. We evaluate our method by applying the
optimization algorithm presented in Sec. 3 to pairs of images


Algorithm 1

Require: Two images $I$ and $J$, fundamental matrix $F$,
distortion bound $\mu$, triangle edge length $\eta$, and a
bound on the Sampson distance $\delta$

1: // Find putative matches
   $\{(p_m, q_m)\}$ = EpipolarSIFT($I$, $J$, $F$, $\delta$)
2: // Epipolar triangulation of $I$ according to $F$ (Section 3.1.3)
   $T$ = DelaunayTriangulation($I$, Constraints($F$), $\eta$)
3: Compute barycentric coordinates for $\{p_m\}$ (10)
4: // Optimization
   $p = 0.001$, $\varepsilon$ = diameter($I$)
5: $\forall m$, $h'_m = p_m - q_m$
6: while $\varepsilon \ge 1$ do
7:    while not converged do
8:       Solve Eq. (13) using SOCP solver, obtaining $\Phi$
9:       $\forall m$, $h'_m = \Phi(p_m) - q_m$
10:   end while
11:   $\varepsilon = \varepsilon / 2$
12: end while
13: return A subset of matched points $\{(p_{m_i}, q_{m_i})\}$
    and a map $\Phi$


from the dataset of [26]. The dataset contains two multiview
collections of high-resolution images (2048 x 3072),
referred to as "Herzjesu" and "Fountain," provided with
ground truth depth maps. The Herzjesu dataset contains 8
images and the Fountain dataset contains 11 images.
Therefore, in total there are 83 stereo pairs (28 + 55) with
varying distances between focal points. We tested each pair
twice, seeking a map from the left image to the right one and
vice versa, obtaining 166 matching problems.

For evaluation we further process the ground truth depth
values to obtain ground truth matches. Specifically, for each
dataset we employ ray casting (z-buffering) to the 3D
surface, obtaining ground-truth correspondences at sub-pixel
accuracy. We further used ray casting to determine an
occlusion mask and excluded those pixels (for the left image)
from our evaluation. (These masks of course are not known
to the algorithm and are used only for evaluation.)

Our optimization algorithm can work in reasonable
run-times (roughly 5 minutes) when applied to the
high-resolution images. However, in order to compare to
state-of-the-art algorithms, which are considerably slower at
those resolutions, we use the lower resolution (308 x 461)
suggested in [27, 28]. We do not rectify the images or apply
any other pre-processing.

Epipolar SIFT. Our algorithm takes as input pairs of
putative correspondences and builds an EBD map that is
consistent with as many of the input matches as it can. For
the experiments we used SIFT matches (using the VLFeat
software package [30]). Classical SIFT matching seeks
putative matches throughout the entire image domain. As we
assume that epipolar geometry is known (either exactly or
approximately), we modify the matching procedure as follows.
Given a SIFT descriptor at location $p$ in the left image,
we restrict the search for a putative match, $q$, to the area
close to the corresponding epipolar line in the right image.
This area is determined by limiting the Sampson distance
between $p$ and $q$, i.e.,


\[
\frac{(q^T F p)^2}
     {(Fp)_1^2 + (Fp)_2^2 + (F^T q)_1^2 + (F^T q)_2^2}
\le \delta, \qquad (14)
\]


where $F$ is the fundamental matrix, $p$ and $q$ are written
in homogeneous coordinates, and $(Fp)_i$ denotes the $i$-th
entry of the vector $Fp$. We further accept a match $(p, q)$
if its SIFT score is at least twice as high as the score of
$(p, q')$ for any other $q'$ within Sampson distance
$\delta$. We set $\delta$ to 5. Fig. 3 shows an example of
the putative matches obtained using the classical
methodology of SIFT, while Fig. 4 demonstrates the putative
matches obtained with the described methodology, Epipolar
SIFT. In these pictures the images are presented side by
side, with the color of the markers corresponding to the
value of the x-coordinate and the size of the markers
corresponding to the value of the y-coordinate. It is evident
that the set of putative matches obtained with Epipolar SIFT
is richer than that obtained with the classical method.
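
The Sampson expression used to gate candidate matches can be sketched as below. The function name and the rectified-pair fundamental matrix (for which epipolar lines are image rows) are illustrative assumptions; the actual matcher is not reproduced here.

```python
import numpy as np

def sampson_distance(F, p, q):
    """Sampson expression for homogeneous p (left image) and q (right image):
    (q^T F p)^2 divided by the sum of squares of the first two entries
    of F p and F^T q."""
    Fp = F @ p
    Ftq = F.T @ q
    return (q @ Fp) ** 2 / (Fp[0] ** 2 + Fp[1] ** 2 + Ftq[0] ** 2 + Ftq[1] ** 2)

# Rectified pair (an assumption for illustration): q^T F p = p_y - q_y.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
p = np.array([10.0, 20.0, 1.0])
on_line = sampson_distance(F, p, np.array([35.0, 20.0, 1.0]))   # same row
off_line = sampson_distance(F, p, np.array([35.0, 24.0, 1.0]))  # 4 rows away
```

A candidate `q` would be kept only if this value does not exceed the chosen bound; in the rectified example, a vertical offset of 4 pixels gives a value of 8.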


Algorithms for evaluation. We compare our method to
the following algorithms:

1. BD: Feature matching by bounded distortion, suggested
by Lipman et al. [18]. This method serves as a baseline
to our method since it seeks correspondences
consistent with a bounded distortion transformation,
but does not take epipolar constraints into account.

2. Spectral: The spectral technique of Leordeanu and
Hebert [16]. This method uses graph methods to find
point matches by minimizing pairwise energies.

3. SiftFlow: by Liu et al. [20], which finds dense
correspondence by minimizing an MRF energy whose
unary term measures the match between SIFT descriptors.

4. Homography: Mapping by looking for the best
homography (computed with RANSAC [10]).

5. Stereo: by Lee et al. [15], which finds dense
correspondence between the images after rectification.

We note that the algorithms of [16] and [20] were not
designed specifically for wide baseline stereo. For a fair
comparison we therefore tested those algorithms in two
settings: first in their original (unrestricted) setting, and
second in a setting that integrates the knowledge of epipolar
geometry into the algorithms. The latter is achieved as
follows. For [16] we used a version of the algorithm that
allows it to select from a candidate set of matches that were
either extracted from the entire image (for the unrestricted
setting) or from the Epipolar SIFT matches (i.e., the same
input given to our algorithm). Furthermore, since this
algorithm does not compute a map (it only returns a sparse
set of matches) we further applied cubic interpolation to
extend the matches to the entire image. For [20] we modified
the code to allow only maps on or close to corresponding
epipolar lines (we set the Sampson distance to 2, which gave
the best result). Finally, for homography we used putative
matches obtained with Epipolar SIFT, and for the stereo
algorithm we used ground truth matches to perform the
rectification.

Results. Figures 5 and 6 show an example of the results
obtained with our method. The figures show, respectively,
the set of correspondences $\{(p_m, q_m)\}$ and the map
$\Phi$ returned by our optimization. To further evaluate the
map computed with our algorithm for the entire dataset, we
checked, for each tested pair of images $I$ and $J$, all
pixels in $I$ after masking it with the ground truth
occlusion map. For each non-occluded pixel $p$ we measured
the Euclidean distance $\|\Phi(p) - q\|$, where $q$ is the
ground truth point corresponding to $p$. We then produced a
cumulative histogram depicting the fraction of non-occluded
points in $I$ against their displacement error from the
ground truth target position. In Figures 7 and 8 we report
for each error value the median number of points that
achieved this error or less over all pairs of images. Table 1
further shows the median fraction of non-occluded pixels that
were mapped to within 1 pixel accuracy by our map $\Phi$. We
show our results both with an exact fundamental matrix
(obtained from ground truth) and with an approximated one
(computed with RANSAC [10] using classical SIFT). Our results
are further compared to Spectral [16], SiftFlow [21] (both
with and without epipolar constraints), to homography
estimation and to classical stereo estimation. (To simplify
the table we only include results for the epipolar-enhanced
algorithms.) As can be seen from the figures and the table,
our method outperformed all the tested methods on both
datasets with both an exact and an approximate fundamental
matrix. We note further that for all algorithms there was no
marked difference between the use of an exact and an
approximate fundamental matrix (solid lines vs. dashed), and
all methods benefited from incorporating epipolar constraints
(compare to dotted lines, for the unrestricted versions).
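
The evaluation protocol described above can be sketched as follows: per image pair, compute the fraction of non-occluded pixels whose error is within each threshold, then take the median over pairs. The function name, array layout (dense maps of shape H x W x 2), and toy data are assumptions for illustration.

```python
import numpy as np

def cumulative_accuracy(pred, gt, visible, thresholds):
    """Fraction of visible pixels with ||pred - gt|| <= t, for each threshold t.

    pred, gt: arrays of shape (H, W, 2) with mapped and ground-truth positions.
    visible:  boolean mask of shape (H, W); occluded pixels are False.
    """
    err = np.linalg.norm(pred - gt, axis=-1)[visible]
    return np.array([(err <= t).mean() for t in thresholds])

# Toy 2x2 "image": errors 0.5, 1.5, 3.0, 0.2; the 3.0 pixel is occluded.
gt = np.zeros((2, 2, 2))
pred = np.zeros((2, 2, 2))
pred[0, 0] = [0.5, 0.0]
pred[0, 1] = [0.0, 1.5]
pred[1, 0] = [3.0, 0.0]
pred[1, 1] = [0.2, 0.0]
visible = np.array([[True, True], [False, True]])

curve = cumulative_accuracy(pred, gt, visible, thresholds=[1.0, 2.0])

# Median over image pairs (here the same toy pair twice).
median_curve = np.median(np.stack([curve, curve]), axis=0)
```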

Figures 9 and 10 further show a breakdown according to
the length of the baseline. For these figures we considered
in each of the two datasets all pairs $I_i$ and $I_{i+k}$ for
each value $k$ (between 1 and 7 for Herzjesu and between 1
and 10 for




Figure 3. Putative matches obtained with the classical SIFT al¬ 
gorithm, which seeks matches over the entire image. The figure 
shows images 7 and 3 from the Fountain dataset. 



Figure 4. Putative matches obtained with Epipolar SIFT. In this
case the search for matches is restricted by the Sampson distance
to the immediate surroundings of the corresponding epipolar line.
It is evident that the set of putative matches is richer than that
obtained with the classical matching algorithm (Fig. 3).



Figure 5. Matches $\{(p_m, q_m)\}$ obtained with our EBD solver.


Figure 6. The map obtained with our EBD solver. 

Fountain). For each such set of pairs we counted the number
of pixels mapped by our computed map $\Phi$ with error
< 1 pixel and plotted the median of these numbers. As
expected, the closer together the pairs are, the better our
method performs. Compared to the other methods, our method
seems to achieve superior accuracy in almost all conditions.

For a pair of images in this dataset our algorithm runs in 
100 seconds on a 3.50 GHz Intel Core i7. This is compared 
to 400 seconds required for the non-convex BD of [18]. In 
general, running the non-convex BD with features restricted 
to epipolar lines is significantly slower and achieves slightly 



Figure 7. The percent of pixels mapped by each method to within 
an error specified on the horizontal axis from their ground truth 
target location, for all pairs of images. Median computed for all 
pairs in the Herzjesu dataset. 



Figure 8. The percent of pixels mapped by each method to within 
an error specified on the horizontal axis from their ground truth 
target location, for all pairs of images. Median computed for all 
pairs in the Fountain dataset (legend of Fig. 7 applies here). 


Algorithm                Fountain   Herzjesu
EBD (ours), exact F         54.77      69.11
EBD (ours), approx F        51.65      68.28
Spectral, exact F           47.70      56.13
Spectral, approx F          44.40      56.70
SiftFlow, exact F           32.44      47.45
SiftFlow, approx F          32.19      47.97
Homography, exact F         27.40      39.95
Stereo, exact F             26.84      34.89


Table 1. The percent of pixels mapped by each method to within 
one pixel from their ground truth target location. Median com¬ 
puted for all pairs of images in the Fountain and Herzjesu datasets. 


inferior results.

Figure 9. Performance as a function of baseline. The graph shows
the percent of pixels mapped by each method to within one pixel
from their ground truth target location plotted against frame
difference in the sequence for the Herzjesu dataset.



Figure 10. Performance as a function of baseline. The graph
shows the percent of pixels mapped by each method to within one
pixel from their ground truth target location plotted against frame
difference in the sequence for the Fountain dataset.